Problems with Obesity, specifically child obesity
“There is concern about the rise of childhood obesity and the implications of such obesity persisting into adulthood. The risk of obesity in adulthood and risk of future obesity-related ill health are greater as children get older. Studies tracking child obesity into adulthood have found that the probability of overweight and obese children becoming overweight or obese adults increases with age. The health consequences of childhood obesity include: increased blood lipids, glucose intolerance, Type 2 diabetes, hypertension, increases in liver enzymes associated with fatty liver, exacerbation of conditions such as asthma and psychological problems such as social isolation, low self-esteem, teasing and bullying.”
The aims of this project are:
Identify the most influencing factors for childhood obesity for children in in Year 6 (age 10/11 in 2017/18)
Childhood obesity in 2017/18 ranges from 11.37% in Richmond upon Thames to 29.66% in Barking and Dagenham.
Determine if different factors effect the prevalence of obesity at Reception for the same cohort of children (age 5/6 in 2010/11) within the same local area
In 2010/11, the obesity levels for Reception pupils was 9.44%. In 2017/18, when the same cohort of pupils were in Year 6, the measured levels of childhood obeisty had increased to 20.14%
I have chosen to do this on local authority level as this is gives 152 different areas to model whilst ensuring there is still ample data available at this denomination. Local authorities are also responsible for local schools and allocating spending within the area so give a good variation of different spending allocations/policies the findings could be used to explore.
# Viewing the indicator ID and area type ID for all the fingertips data
indicators <- indicators()
areatypes <- area_types()
All of the indicators can be seen using (View(indicators)), it is then easy to search for any specific indicator
# Use this function to view the indicator name for any indicator ID
ind <- function(a) {
g <- as_tibble(indicator_metadata(a)[, 2]) %>%
filter(row_number() == 1) %>%
pull(Indicator)
cat(g)
}
# Use this function to view the indicator definition for any indicator ID
def <- function(a) {
g <- as_tibble(indicator_metadata(a)[, 3]) %>%
filter(row_number() == 1) %>%
pull(Definition)
cat(g)
}
#' Use this function to extract the Value and Time period data for a specific indicator ID and area code.
#' @param a Indicator ID number
#' @param b Area Type ID number
#' @return function returns a dataframe containing the Area Name, Indicator Value and Time Period
datacollection <- function(a, b) {
e <- fingertips_data(IndicatorID = a, AreaTypeID = b)
e <- e %>%
filter(AreaType == "County & UA") %>% # selects all data with County and UA as its area
select(AreaName, Value, Timeperiod, IndicatorName) # selects only the columns of interest
colnames(e)[colnames(e) == "Value"] <- e[4, 4] # renames the column name value with the Indicator Name
e <- e[-4] # removes the indicator name column
# simplifying the names
names(e) <- names(e) %>%
{gsub("\\s*\\([^\\)]+\\)", "", .) } %>% # removes any text in brackets
{gsub(", ", "", .) } %>% # removes commas
{gsub("- ", "", .) } %>% # removes -
{gsub("\\s+$", "", .) } %>% # removes spaces at the end of name
{gsub(" ", "_", .) } %>% # replaces all spaces with underscores
{gsub(":", "", .) } %>% # removes colons
{tolower(.) } # makes names lowercase
e
}
# Use this function to select a specific time frame
dateselector <- function(a, b) { # takes data and a date
d <- a %>%
filter(timeperiod == b) # filters the data to only show the required time frame
d <- d[-3] # removes the TimePeriod column
}
These are the variables I would like to model:
Reception: Prevalence of obesity (including severe obesity) measured by the BMI greater than or equal to the 95th centile of the UK90 growth reference among children in Reception (age 4-5 years)
# Reception: Prevalence of Obesity
# (Indicator ID:90319)
Reception_Obesity <- datacollection(90319, 102) # selecting the data
Reception_Obesity[2] <- round(Reception_Obesity[2], 10) # rounding data
Reception_Obesity <- dateselector(Reception_Obesity, "2010/11") # selecting only 2010/11 values
Year 6: Prevalence of obesity (including severe obesity) measured by BMI greater than or equal to the 95th centile of the UK90 growth reference among children in Year 6 (age 10-11 years)
# Year 6 Prevalence of Obesity
# (Indicator ID:90323)
Y6_Obesity <- datacollection(90323, 102) # selecting the data
Y6_Obesity[2] <- round(Y6_Obesity[2], 10) # rounding data
Y6_Obesity <- dateselector(Y6_Obesity, "2017/18") # selecting only 2017/18 values
These are the variables I would like to use to predict childhood obesity:
School Readiness: the percentage of children achieving a good level of development at the end of reception.
Children are defined as having reached a good level of development if they achieve at least the expected level in the early learning goals in the prime areas of learning and the early learning goals in the specific areas of mathematics and literacy.
# School Readiness
# (Indicator ID:90631)
School_Readiness <- datacollectionsex(90631, 102) # selecting the data
School_Readiness <- dateselector(School_Readiness, "2012/13") # selecting only 2012/13 values
Children with one or more decayed, missing or filled teeth at age 3
# Tooth Decay Data
# (Indicator ID:92501)
Tooth_Decay <- datacollection(92501, 102) # selecting the data
Tooth_Decay <- dateselector(Tooth_Decay, "2012/13") # selecting only 2012/13 values
# Imputting missing values manually using corresponding values for the Area
Tooth_Decay <- Tooth_Decay %>%
# London
mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Greenwich", 0.42371163)) %>%
mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Bexley", 0.42371163)) %>%
mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Waltham Forest", 0.42371163)) %>%
mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Islington", 0.42371163)) %>%
# South East
mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "East Sussex", 0.27081672)) %>%
# East Midlands
mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Lincolnshire", 0.43444082)) %>%
# South West
mutate(dmft_in_three_year_olds = replace(dmft_in_three_year_olds, areaname == "Bath and North East Somerset", 0.30771022))
Deprivation score (IMD 2015) (measures of multiple deprivation at the small area level)
# Deprivation Score
# (Indicator ID:91872)
Deprivation_Score <- datacollection(91872, 102) # selecting the data
Deprivation_Score <- dateselector(Deprivation_Score, "2015") # selecting 2015 values
Persistent absentees - Primary school Percentage of primary school enrolments classed as persistent absentees (defined as missing 10% or more of possible sessions).
# Persistent Absentees
# (Indicator ID:92563)
Persistent_Absentees <- datacollection(92563, 102) # selecting the data
Persistent_Absentees <- dateselector(Persistent_Absentees, "2014/15") # selecting 2014/15 values
Children in low income families aged 5 to 10 - The number of children aged between 5 and 10 in a family which is in receipt of Working Tax Credits, Child Tax Credits, Income Support, or Jobseekers Allowance.
# Low Income Familes
# (Indicator ID:92479)
Low_Income_Families <- datacollectionage(92479, 102, "5-10 yrs") # selecting the data
Low_Income_Families <- dateselector(Low_Income_Families, "2013") # selecting 2013 values
Admissions for diabetes for children and young people aged under 19 years
# Diabetes
# (Indicator ID:92622)
Diabetes <- datacollection(92622, 102) # selecting the data
Diabetes <- dateselector(Diabetes, "2016/17") # selecting 2016/17 values
# Replacing missing value for Derby with 2015/16 value
Diabetes <- Diabetes %>%
mutate(admissions_for_diabetes_for_children_and_young_people_aged_under_19_years = replace(admissions_for_diabetes_for_children_and_young_people_aged_under_19_years, areaname == "Derby", 46.80666))
School pupils with social, emotional and mental health needs: % of school pupils with social, emotional and mental health needs (Primary school age)
# Social Health Needs
# (Indicator ID:91871)
Social_Health_Needs <- datacollectionage(91871, 102, "Primary school age") # selecting the data
Social_Health_Needs <- dateselector(Social_Health_Needs, "2018") # selecting 2018 values
Free school meals: % uptake among all pupils (Primary school age)
# Free school meals
# (Indicator ID: 90922)
FreeSchoolMeals <- datacollectionage(90922, 102, "Primary school age") # selecting the data
FreeSchoolMeals <- dateselector(FreeSchoolMeals, "2013") # selecting 2013 values
Obesity: QOF prevalence (18+): Patients aged 18 and over with a BMI of 30 or above
# Obesity 18+
#(Indicator ID: 92588)
Obesity18plus <- datacollection(92588, 102) # selecting the data
Obesity18plus <- dateselector(Obesity18plus, "2017/18") # selecting 2017/18 values
Gender Pay equality - Gross median hourly pay, excluding overtime, for women
# Gender Pay Equality
# (Indicator ID: 92817)
GenderPayEquality <- datacollection(92817, 102) # selecting the data
GenderPayEquality <- GenderPayEquality[-3] # removing time period column as only 1 timeperiod available
# Replace missing value for Kensington and Chelsea with average of Hammersmith and Fulham and Westminster
# calculating the average (89.37865+74.79319)/2 = 82.08592
GenderPayEquality <- GenderPayEquality %>%
mutate(gender_pay_equality = replace(gender_pay_equality, areaname == "Kensington and Chelsea", 82.08592))
Percentage of physically active adults
# Physically active adults
# (Indicator ID: 93014)
ActiveAdults <- datacollection(93014, 102) # selecting the data
ActiveAdults <- dateselector(ActiveAdults, "2015/16") # selecting 2015/16 time period
Breastfeeding initiation - Measures the percentage of mothers who give their babies breast milk in the first 48 hours after delivery
# Breastfeeding Initiation
#(Indicator ID: 20201)
Breastfeeding <- datacollection(20201, 102)
# selecting 2010/11, 2011/12 and 2012/13 Data
Breastfeeding1011 <- dateselector(Breastfeeding, "2010/11")
Breastfeeding1112 <- dateselector(Breastfeeding, "2011/12")
Breastfeeding1213 <- dateselector(Breastfeeding, "2012/13")
# merging these together into 1 data frame
Breastfeeding2 <- list(Breastfeeding1011, Breastfeeding1112, Breastfeeding1213) %>%
reduce(left_join, by = "areaname")
# replacing all NA values for 2010/11 with the value for 2011/12
Breastfeeding2 <- within(Breastfeeding2, breastfeeding_initiation.x <- ifelse(is.na(breastfeeding_initiation.x), breastfeeding_initiation.y, breastfeeding_initiation.x))
# replacting NA value for 2010/11 that was not replaced by 2011/12 data with the 2012/13 data
Breastfeeding2 <- within(Breastfeeding2, breastfeeding_initiation.x <- ifelse(is.na(breastfeeding_initiation.x), breastfeeding_initiation, breastfeeding_initiation.x))
# selcting the newly filled 2010/11 column
Breastfeeding <- select(Breastfeeding2, areaname, breastfeeding_initiation.x)
# removing .x from column name
names(Breastfeeding) <- gsub(".x", "", names(Breastfeeding))
Population vaccination coverage - MMR for one dose (5 years old)
# MMR Vaccination coverage
# (Indicator ID: 30310)
Vaccination <- datacollection(30310, 102) # Selecting the data
Vaccination <- dateselector(Vaccination, "2010/11") # Selecting 2010/11 values
Infant mortality rate
# Infant Mortality Rate
#(Indicator ID: 92196)
Mortality <- datacollection(92196, 102) # Selecting the data
Mortality <- dateselector(Mortality, "2007 - 09") # Selecting the time period
The percentage of the total resident population who are 0-15 years of age
# Resident Population Aged 0 - 15
# (Indicator ID: 93084)
residentpopulation <- datacollection(93084, 102) # selecting the data
residentpopulation <- residentpopulation[-3] # removing time period column as only 2017 values available
Access to woodland: Percentage of the population in each local authority that has accessible woodland of at least 2 hectare within 500 metres of where they live
# Woodland Access
#(Indicator ID: 92814)
Woodland <- datacollection(92814, 102) # collecting the data
Woodland <- Woodland[-3] # removing time period column
# Barking & Dagenham = Average of Bexley, Greenwich, Havering, Newham and Redbridge
# (4.57+25.78+14.74+11.97+24.41)/5 = 16.294
# Islington = Average of Camden, Hackney and Haringey
# (5.95+9.87+8.66)/3 = 8.16
Woodland <- Woodland %>%
mutate(access_to_woodland = replace(access_to_woodland, areaname == "Barking and Dagenham", 16.294)) %>%
mutate(access_to_woodland = replace(access_to_woodland, areaname == "Islington", 8.16))
Admissions for asthma for children aged 0 to 9
# Asthma Admissions
# (Indicator ID: 92481)
Asthma <- datacollection(92481, 102) # collecting the data
Asthma1617 <- dateselector(Asthma, "2016/17") # selecting 2016/17 data
Asthma1516 <- dateselector(Asthma, "2015/16") # selecting 2015/16 data
Asthma2 <- list(Asthma1617, Asthma1516) %>% # joining together both time values
reduce(left_join, by = "areaname")
# replacing na in 2016/17 data with 2015/16 data
Asthma2 <- within(Asthma2, admissions_for_asthma_for_children_aged_0_to_9.x <- ifelse(is.na(admissions_for_asthma_for_children_aged_0_to_9.x), admissions_for_asthma_for_children_aged_0_to_9.y, admissions_for_asthma_for_children_aged_0_to_9.x))
# selecting filled column
Asthma <- select(Asthma2, areaname, admissions_for_asthma_for_children_aged_0_to_9.x)
names(Asthma) <- gsub(".x", "", names(Asthma)) # removing .x from column name
Admissions for gastroenteritis in infants aged 2, 3 and 4 years
# Gastroenteritis Admissions
# (Indicator ID: 92248)
gastroenteritis <- datacollection(92248, 102) # collecting the data
gastroenteritis1617 <- dateselector(gastroenteritis, "2016/17") # selecting 2016/17 values
gastroenteritis1516 <- dateselector(gastroenteritis, "2015/16") # selecting 2015/16 values
gastroenteritis2 <- list(gastroenteritis1617, gastroenteritis1516) %>% # joining together both time values
reduce(left_join, by = "areaname")
# replacing na in 2016/17 data with 2015/16 data
gastroenteritis2 <- within(gastroenteritis2, admissions_for_gastroenteritis_in_infants_aged_23_and_4_years.x <- ifelse(is.na(admissions_for_gastroenteritis_in_infants_aged_23_and_4_years.x), admissions_for_gastroenteritis_in_infants_aged_23_and_4_years.y, admissions_for_gastroenteritis_in_infants_aged_23_and_4_years.x))
# selecting filled column
gastroenteritis <- select(gastroenteritis2, areaname, admissions_for_gastroenteritis_in_infants_aged_23_and_4_years.x)
names(gastroenteritis) <- gsub("_in_infants_aged_23_and_4_years.x", "", names(gastroenteritis)) # renaming column
Admissions for respiratory tract infections in infants aged under 1 year
# Admissions for Respiratory Tract Infections
# (Indicator ID: 92253)
respiratory <- datacollection(92251, 102) # selecting the data
respiratory <- dateselector(respiratory, "2016/17") # selecting 2016/17 time period
# Missing Data for Nottingham and Nottinghamshire
# Nottingham
respiratory <- respiratory %>%
mutate(admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year = replace(admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year, areaname == "Nottingham", 657.6448)) %>%
# Nottinghamshire
mutate(admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year = replace(admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year, areaname == "Nottinghamshire", 642.7746))
Parent Name
# Selecting the Parent Name
Parent_Name <- fingertips_data(IndicatorID = 92814) %>%
filter(AreaType == "County & UA") %>%
select(AreaName, ParentName)
names(Parent_Name) <- tolower(names(Parent_Name))
# Merging the Data
EDA_Data <-
list(
Parent_Name,
Reception_Obesity,
Y6_Obesity,
Deprivation_Score,
Persistent_Absentees,
Low_Income_Families,
School_Readiness,
Social_Health_Needs,
Diabetes,
Tooth_Decay,
Obesity18plus,
ActiveAdults,
Asthma,
Breastfeeding,
gastroenteritis,
Vaccination,
Mortality,
respiratory,
Woodland,
residentpopulation,
GenderPayEquality
) %>%
purrr::reduce(left_join, by = "areaname")
Adding the London Indicator
# London Indicator
EDA_Data <- mutate(EDA_Data, london = ifelse(EDA_Data$parentname == "London region", 1, 0))
# List of all Inner London Boroughs
central <- c("Camden", "Greenwich", "Hackney", "Hammersmith and Fulham", "Islington", "Kensington and Chelsea", "Lambeth", "Lewisham", "Southwark", "Tower Hamlets", "Wandsworth", "Westminster" )
# Central London Indicator
EDA_Data <- mutate(EDA_Data, inner_london = ifelse(is.element(EDA_Data$areaname, central), 1, 0))
# Replacing Missing value for Buckinghamshire prevalence of Obesity (18+) with South East Value of 8.313363
EDA_Data <- EDA_Data %>%
mutate(obesity_qof_prevalence = replace(obesity_qof_prevalence, areaname == "Buckinghamshire", 8.313363))
# Removing City of London, Isles of Scilly and Torbay as they do not contain values for Childhood Obesity levels at age year 6 or reception.
EDA_Data <- subset(EDA_Data, areaname != "City of London")
EDA_Data <- subset(EDA_Data, areaname != "Isles of Scilly")
EDA_Data <- subset(EDA_Data, areaname != "Torbay")
# Removing Rutland as more than half of the data values are missing
EDA_Data <- subset(EDA_Data, areaname != "Rutland")
# Simplifying Wording for parentname
EDA_Data$parentname <- removeWords(EDA_Data$parentname, "region")
EDA_Data$parentname <- removeWords(EDA_Data$parentname, " and the Humber")
EDA_Data$parentname <- as.factor(EDA_Data$parentname)
# plotting reception against year 6 prevalence of obesty
plot1 <- ggplot(EDA_Data, aes(x = reception_prevalence_of_obesity, y = year_6_prevalence_of_obesity, text = areaname)) +
geom_point() +
labs(title = "Prevalence of Obesity at Reception (Age 4/5) and Year 6 (Age 10/11)") +
xlab("Reception: Prevalence of Obesity 2010/11") +
ylab("Year 6: Prevalence of Obesity 2017/18") +
theme(plot.title = element_text(hjust = 0.5))
ggplotly(plot1)
# Plotting England Year 6 Prevalence of Obesity
ggplot(EDA_Data, aes(x = "", y = year_6_prevalence_of_obesity)) +
stat_boxplot(geom = "errorbar", width = 0.1) +
geom_boxplot(width = 0.4, outlier.color = "red", outlier.shape = 1) +
geom_jitter(width = 0.15, size = 1) +
labs(title = "National Spread of Prevalence of Obesity (Age 10/11)") +
xlab("England") +
ylab("Year 6 Prevalence of Obesity") +
theme(plot.title = element_text(hjust = 0.5)) +
coord_flip()
# Plotting regional variation in Year 6 Prevalence of Obesity
ggplot(EDA_Data, aes(x = parentname, y = year_6_prevalence_of_obesity)) +
stat_boxplot(geom = "errorbar") +
geom_boxplot(outlier.color = "red", outlier.shape = 1) +
labs(title = "Regional Spread of Prevalence of Obesity (Age 10/11)") +
xlab("Region") +
ylab("Year 6 Prevalence of Obesity") +
theme(plot.title = element_text(hjust = 0.5)) +
coord_flip()
Correlation Plot between the variables
# Calculating Correlation Matrix
correlation <- cor((select(EDA_Data, -"areaname", -"parentname", -"reception_prevalence_of_obesity", -"year_6_prevalence_of_obesity", -"london", -"inner_london")), use = "complete.obs")
# Using Custom Labels as the variable names were too long
colnames(correlation) <- c("Deprivation Score", "Persistent Abesentees", "Low Income Families", "School Readiness", "Social Health Needs", "Diabetes Admissions (0-9", "Decayed/Missing/Filled Teeth", "Obesity in 18+", "Physically Active Adults", "Asthma Admissions (0-9)", "Breastfeeding Initiation", "Gastroenteritis (2-4)", "MMR Vaccination (5)", "Child Mortality Rate", "Respiratory Infections (2-4)", "Woodland Access", "% Population Age 0-15", "Gender Pay Equity")
rownames(correlation) <- c("Deprivation Score", "Persistent Abesentees", "Low Income Families", "School Readiness", "Social Health Needs", "Diabetes Admissions (0-9", "Decayed/Missing/Filled Teeth", "Obesity in 18+", "Physically Active Adults", "Asthma Admissions (0-9)", "Breastfeeding Initiation", "Gastroenteritis (2-4)", "MMR Vaccination (5)", "Child Mortality Rate", "Respiratory Infections (2-4)", "Woodland Access", "% Population Age 0-15", "Gender Pay Equity")
# Plotting Correlation
corrplot(correlation, method = "circle", type = "upper", tl.cex = 0.75, tl.col = "black")
Finding highest correlations between variables
# Shows biggest correlations
correlation[upper.tri(correlation, diag = TRUE)] <- NA # sets upper half of correlation matrix to NA to avoid duplicates
m <- melt(correlation) # collapsing the dataframe
m <- m[order(-abs(m$value)), ] # ordering by correlation value (largest to smallest)
m <- na.omit(m) # removing NAs
head(m) # shows the top of the dataframe
## Var1 Var2 value
## 3 Low Income Families Deprivation Score 0.9540741
## 137 Breastfeeding Initiation Obesity in 18+ -0.7962584
## 213 Respiratory Infections (2-4) Gastroenteritis (2-4) 0.7850130
## 21 Low Income Families Persistent Abesentees 0.7413458
## 2 Persistent Abesentees Deprivation Score 0.7262395
## 192 Gastroenteritis (2-4) Breastfeeding Initiation -0.5902976
This table shows the 6 largest correlation relationships
While low income families is more closely related to children in an area, I chose to remove this predictive variable in favour of Deprivation score as this encompassed more measures of deprivation in more detail than the low income families indicator.
# plotting breastfeeding initiation against adult obesity
ggplot(EDA_Data, aes(x = breastfeeding_initiation,
y = obesity_qof_prevalence)) +
geom_point() + # adding points
xlab("Breastfeeding Initiation") + # adding x axis label
ylab("Obesity 18+") + # adding y axis label
stat_smooth(method = "lm", col = "darkblue", se = F) # adding linear model line
Breastfeeding Initiation and Population MMR Vaccination Coverage
# Plotting breastfeeding initiation against MMR vaccination coverage
ggplot(EDA_Data, aes(x = (breastfeeding_initiation), # plotting breastfeeding initiation on x axis
y = population_vaccination_coverage_mmr_for_one_dose)) + # plotting MMR Vaccination on y axis
geom_point(aes(color = parentname)) + # colouring the points by region
labs(color = "Region") + # adding the ledgend
xlab("Breastfeeding Initiation") + # x axis label
ylab("MMR Vaccination Population Coverage") # y axis label
# plotting regional spread of MMR Vaccination coverage
ggplot(EDA_Data, aes(x = parentname, y = population_vaccination_coverage_mmr_for_one_dose)) + # plotting Vaccination rates against Region
stat_boxplot(geom = "errorbar") + # adding end bars
geom_boxplot(outlier.color = "red", outlier.shape = 1) + # colouring outliers red circles
xlab("Region") + # X axis label
ylab("MMR Vaccination") + # Y axis label
coord_flip() # flipping the axis
Revisiting the aims:
Identify the most influencing factors for childhood obesity for children in in Year 6 (age 10/11 in 2017/18)
How do these vary by local area?*
Do this by creating an explanatory linear model to determine the most important predictors for modelling Year 6 obesity
Determine if different factors effect the prevalence of obesity at Reception for the same cohort of children (age 5/6 in 2010/11) within the same local area
Are these different and does this explain the increase in childhood obesity from Reception to Year 6?*
Do this by creating another explanatory linear model for reception in 2010/11 and comparing the important predictors with the previous model
#Selecting the variables to model (ignoring Region and variables removed earlier in EDA)
model_data <- select(EDA_Data, -parentname, -reception_prevalence_of_obesity, -population_vaccination_coverage_mmr_for_one_dose, -children_in_low_income_families_aged_5_to_10, -london) # selecting data to model
# Inital Linear Model including all the quantitative predictors except MMR Vaccination coverage, Reception obesity levels and low income families
Model1 <- lm(year_6_prevalence_of_obesity ~ . - areaname, data = model_data)
VIF (Variance Inflation Factor) Test for multicolinearity
# VIF of Model 1 - Check for multicolinearity
car::vif(Model1)
## deprivation_score
## 5.367664
## persistent_absentees_primary_school
## 2.686040
## school_readiness_the_percentage_of_children_achieving_a_good_level_of_development_at_the_end_of_reception
## 1.283347
## `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs`
## 1.311034
## admissions_for_diabetes_for_children_and_young_people_aged_under_19_years
## 1.427380
## dmft_in_three_year_olds
## 1.844987
## obesity_qof_prevalence
## 3.592104
## percentage_of_physically_active_adults
## 2.406471
## admissions_for_asthma_for_children_aged_0_to_9
## 2.166784
## breastfeeding_initiation
## 4.758843
## admissions_for_gastroenteritis
## 3.621918
## infant_mortality
## 1.967503
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year
## 3.203037
## access_to_woodland
## 1.278087
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`
## 1.744322
## gender_pay_equality
## 1.610330
## inner_london
## 1.995328
Forwards and Backwards Stepwise Regression using AIC
# Stepwise model selection using AIC
MASS::stepAIC(Model1, direction = "both", trace = FALSE)
##
## Call:
## lm(formula = year_6_prevalence_of_obesity ~ deprivation_score +
## obesity_qof_prevalence + percentage_of_physically_active_adults +
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year +
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` +
## inner_london, data = model_data)
##
## Coefficients:
## (Intercept)
## 17.717987
## deprivation_score
## 0.247947
## obesity_qof_prevalence
## 0.258372
## percentage_of_physically_active_adults
## -0.185006
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year
## -0.002349
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`
## 0.410762
## inner_london
## 3.173306
Exhaustive Search using regsubsets for the best model with 1-16 predictors
# performring model selection using regsubsets and an exhaustive search
# This finds the best model for each number of predictors between 1 and 17 (full model)
aoutput <- (regsubsets(year_6_prevalence_of_obesity ~ . - areaname, data = model_data, nvmax = 17, method = "exhaustive"))
a <- summary(aoutput)
Plotting BIC for each model
# BIC plot for models selected using regsubsets
plot(a$bic)
Selecting the 6 variable model using regsubsets output
- The 6 variable model gave the lowest BIC value - The predictive variables chosen in the 6 variable model can be found by selecting the variables with a * in row 6
# summary of output produced by regsubsets
summary(aoutput)
## Subset selection object
## Call: regsubsets.formula(year_6_prevalence_of_obesity ~ . - areaname,
## data = model_data, nvmax = 17, method = "exhaustive")
## 17 Variables (and intercept)
## Forced in
## deprivation_score FALSE
## persistent_absentees_primary_school FALSE
## school_readiness_the_percentage_of_children_achieving_a_good_level_of_development_at_the_end_of_reception FALSE
## `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs` FALSE
## admissions_for_diabetes_for_children_and_young_people_aged_under_19_years FALSE
## dmft_in_three_year_olds FALSE
## obesity_qof_prevalence FALSE
## percentage_of_physically_active_adults FALSE
## admissions_for_asthma_for_children_aged_0_to_9 FALSE
## breastfeeding_initiation FALSE
## admissions_for_gastroenteritis FALSE
## infant_mortality FALSE
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year FALSE
## access_to_woodland FALSE
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` FALSE
## gender_pay_equality FALSE
## inner_london FALSE
## Forced out
## deprivation_score FALSE
## persistent_absentees_primary_school FALSE
## school_readiness_the_percentage_of_children_achieving_a_good_level_of_development_at_the_end_of_reception FALSE
## `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs` FALSE
## admissions_for_diabetes_for_children_and_young_people_aged_under_19_years FALSE
## dmft_in_three_year_olds FALSE
## obesity_qof_prevalence FALSE
## percentage_of_physically_active_adults FALSE
## admissions_for_asthma_for_children_aged_0_to_9 FALSE
## breastfeeding_initiation FALSE
## admissions_for_gastroenteritis FALSE
## infant_mortality FALSE
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year FALSE
## access_to_woodland FALSE
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` FALSE
## gender_pay_equality FALSE
## inner_london FALSE
## 1 subsets of each size up to 17
## Selection Algorithm: exhaustive
## deprivation_score persistent_absentees_primary_school
## 1 ( 1 ) "*" " "
## 2 ( 1 ) "*" " "
## 3 ( 1 ) "*" " "
## 4 ( 1 ) "*" " "
## 5 ( 1 ) "*" " "
## 6 ( 1 ) "*" " "
## 7 ( 1 ) "*" " "
## 8 ( 1 ) "*" " "
## 9 ( 1 ) "*" " "
## 10 ( 1 ) "*" " "
## 11 ( 1 ) "*" " "
## 12 ( 1 ) "*" "*"
## 13 ( 1 ) "*" "*"
## 14 ( 1 ) "*" "*"
## 15 ( 1 ) "*" "*"
## 16 ( 1 ) "*" "*"
## 17 ( 1 ) "*" "*"
## school_readiness_the_percentage_of_children_achieving_a_good_level_of_development_at_the_end_of_reception
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## 12 ( 1 ) " "
## 13 ( 1 ) " "
## 14 ( 1 ) " "
## 15 ( 1 ) " "
## 16 ( 1 ) " "
## 17 ( 1 ) "*"
## `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs`
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) "*"
## 11 ( 1 ) "*"
## 12 ( 1 ) "*"
## 13 ( 1 ) "*"
## 14 ( 1 ) "*"
## 15 ( 1 ) "*"
## 16 ( 1 ) "*"
## 17 ( 1 ) "*"
## admissions_for_diabetes_for_children_and_young_people_aged_under_19_years
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) "*"
## 9 ( 1 ) "*"
## 10 ( 1 ) "*"
## 11 ( 1 ) "*"
## 12 ( 1 ) "*"
## 13 ( 1 ) "*"
## 14 ( 1 ) "*"
## 15 ( 1 ) "*"
## 16 ( 1 ) "*"
## 17 ( 1 ) "*"
## dmft_in_three_year_olds obesity_qof_prevalence
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " "*"
## 7 ( 1 ) " " "*"
## 8 ( 1 ) " " "*"
## 9 ( 1 ) " " "*"
## 10 ( 1 ) " " "*"
## 11 ( 1 ) "*" "*"
## 12 ( 1 ) "*" "*"
## 13 ( 1 ) "*" "*"
## 14 ( 1 ) "*" "*"
## 15 ( 1 ) "*" "*"
## 16 ( 1 ) "*" "*"
## 17 ( 1 ) "*" "*"
## percentage_of_physically_active_adults
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) "*"
## 4 ( 1 ) "*"
## 5 ( 1 ) "*"
## 6 ( 1 ) "*"
## 7 ( 1 ) "*"
## 8 ( 1 ) "*"
## 9 ( 1 ) "*"
## 10 ( 1 ) "*"
## 11 ( 1 ) "*"
## 12 ( 1 ) "*"
## 13 ( 1 ) "*"
## 14 ( 1 ) "*"
## 15 ( 1 ) "*"
## 16 ( 1 ) "*"
## 17 ( 1 ) "*"
## admissions_for_asthma_for_children_aged_0_to_9
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## 12 ( 1 ) " "
## 13 ( 1 ) " "
## 14 ( 1 ) "*"
## 15 ( 1 ) "*"
## 16 ( 1 ) "*"
## 17 ( 1 ) "*"
## breastfeeding_initiation admissions_for_gastroenteritis
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " " "
## 4 ( 1 ) " " " "
## 5 ( 1 ) " " " "
## 6 ( 1 ) " " " "
## 7 ( 1 ) " " " "
## 8 ( 1 ) " " " "
## 9 ( 1 ) " " "*"
## 10 ( 1 ) " " "*"
## 11 ( 1 ) " " "*"
## 12 ( 1 ) " " "*"
## 13 ( 1 ) "*" "*"
## 14 ( 1 ) "*" "*"
## 15 ( 1 ) "*" "*"
## 16 ( 1 ) "*" "*"
## 17 ( 1 ) "*" "*"
## infant_mortality
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## 12 ( 1 ) " "
## 13 ( 1 ) " "
## 14 ( 1 ) " "
## 15 ( 1 ) " "
## 16 ( 1 ) "*"
## 17 ( 1 ) "*"
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) "*"
## 6 ( 1 ) "*"
## 7 ( 1 ) "*"
## 8 ( 1 ) "*"
## 9 ( 1 ) "*"
## 10 ( 1 ) "*"
## 11 ( 1 ) "*"
## 12 ( 1 ) "*"
## 13 ( 1 ) "*"
## 14 ( 1 ) "*"
## 15 ( 1 ) "*"
## 16 ( 1 ) "*"
## 17 ( 1 ) "*"
## access_to_woodland
## 1 ( 1 ) " "
## 2 ( 1 ) " "
## 3 ( 1 ) " "
## 4 ( 1 ) " "
## 5 ( 1 ) " "
## 6 ( 1 ) " "
## 7 ( 1 ) " "
## 8 ( 1 ) " "
## 9 ( 1 ) " "
## 10 ( 1 ) " "
## 11 ( 1 ) " "
## 12 ( 1 ) " "
## 13 ( 1 ) " "
## 14 ( 1 ) " "
## 15 ( 1 ) "*"
## 16 ( 1 ) "*"
## 17 ( 1 ) "*"
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`
## 1 ( 1 ) " "
## 2 ( 1 ) "*"
## 3 ( 1 ) " "
## 4 ( 1 ) "*"
## 5 ( 1 ) "*"
## 6 ( 1 ) "*"
## 7 ( 1 ) "*"
## 8 ( 1 ) "*"
## 9 ( 1 ) "*"
## 10 ( 1 ) "*"
## 11 ( 1 ) "*"
## 12 ( 1 ) "*"
## 13 ( 1 ) "*"
## 14 ( 1 ) "*"
## 15 ( 1 ) "*"
## 16 ( 1 ) "*"
## 17 ( 1 ) "*"
## gender_pay_equality inner_london
## 1 ( 1 ) " " " "
## 2 ( 1 ) " " " "
## 3 ( 1 ) " " "*"
## 4 ( 1 ) " " "*"
## 5 ( 1 ) " " "*"
## 6 ( 1 ) " " "*"
## 7 ( 1 ) "*" "*"
## 8 ( 1 ) "*" "*"
## 9 ( 1 ) "*" "*"
## 10 ( 1 ) "*" "*"
## 11 ( 1 ) "*" "*"
## 12 ( 1 ) "*" "*"
## 13 ( 1 ) "*" "*"
## 14 ( 1 ) "*" "*"
## 15 ( 1 ) "*" "*"
## 16 ( 1 ) "*" "*"
## 17 ( 1 ) "*" "*"
Fitting the chosen model
# Fitting the model chosen using regsubsets minimising BIC
modely6 <- lm(year_6_prevalence_of_obesity ~ deprivation_score + `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` + inner_london + obesity_qof_prevalence + admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year + percentage_of_physically_active_adults, data = model_data)
# Model Summary
summary(modely6)
##
## Call:
## lm(formula = year_6_prevalence_of_obesity ~ deprivation_score +
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` +
## inner_london + obesity_qof_prevalence + admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year +
## percentage_of_physically_active_adults, data = model_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1862 -1.2957 0.0226 1.2730 4.3484
##
## Coefficients:
## Estimate
## (Intercept) 17.717987
## deprivation_score 0.247947
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` 0.410762
## inner_london 3.173306
## obesity_qof_prevalence 0.258372
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year -0.002349
## percentage_of_physically_active_adults -0.185006
## Std. Error
## (Intercept) 4.707109
## deprivation_score 0.024989
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` 0.084174
## inner_london 0.696160
## obesity_qof_prevalence 0.093076
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year 0.000788
## percentage_of_physically_active_adults 0.050373
## t value
## (Intercept) 3.764
## deprivation_score 9.922
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` 4.880
## inner_london 4.558
## obesity_qof_prevalence 2.776
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year -2.981
## percentage_of_physically_active_adults -3.673
## Pr(>|t|)
## (Intercept) 0.000245
## deprivation_score < 2e-16
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` 2.83e-06
## inner_london 1.11e-05
## obesity_qof_prevalence 0.006252
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year 0.003386
## percentage_of_physically_active_adults 0.000340
##
## (Intercept) ***
## deprivation_score ***
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` ***
## inner_london ***
## obesity_qof_prevalence **
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year **
## percentage_of_physically_active_adults ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.794 on 141 degrees of freedom
## Multiple R-squared: 0.7713, Adjusted R-squared: 0.7615
## F-statistic: 79.24 on 6 and 141 DF, p-value: < 2.2e-16
Chosen Indicators
1. deprivation score - positive correlation
The more deprived an area is, the higher their levels of year 6 obesity
2. percentage of the total resident population aged 0-15 - positive correlation
The larger the percentage of children aged 0-15 living in a local area, the higher their levels of year 6 obesity
3. ‘London Factor’ - positive correlation
If a child is living in London, they are more likely to be obese
4. Physically active adults - negative correlation
The larger the percentage of physically active adults, the lower the levels of childhood obesity 5. Admissions for respiratory tract infections in infants under 1 year old - negative correlation
The higher the admissions for respiratory tract infections in infants in a local area, the lower the levels of childhood obesity
6. % Obesity 18+ - positive correlation
The larger the percentage of obese adults in the local area, the higher the level of year 6 obesity in the same area
Additional Indicators using AIC:
# Model plots
(plot(modely6))
## NULL
Areas of Interest
1 area that the model is significantly overfitting (predicting higher levels of obesity than the true value)
- 72: Barnsley 2 areas the model is significantly underfitting - 82: Dudley - 116: Merton
# Selecting the Data
model_data2 <- select(EDA_Data, -parentname, -year_6_prevalence_of_obesity, -children_in_low_income_families_aged_5_to_10, -population_vaccination_coverage_mmr_for_one_dose, -london)
# Fitting the full model
Model3 <- lm(reception_prevalence_of_obesity ~ . - areaname, data = model_data2)
# Model selection using forwards/backwards stepwise regression and AIC
MASS::stepAIC(Model3, direction = "both", trace = FALSE)
##
## Call:
## lm(formula = reception_prevalence_of_obesity ~ deprivation_score +
## `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs` +
## obesity_qof_prevalence + percentage_of_physically_active_adults +
## breastfeeding_initiation + admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year +
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` +
## inner_london, data = model_data2)
##
## Coefficients:
## (Intercept)
## 4.0917442
## deprivation_score
## 0.0977588
## `school_pupils_with_socialemotional_and_mental_health_needs_%_of_school_pupils_with_socialemotional_and_mental_health_needs`
## 0.2990894
## obesity_qof_prevalence
## 0.1940950
## percentage_of_physically_active_adults
## -0.0558367
## breastfeeding_initiation
## 0.0346537
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year
## -0.0009793
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age`
## 0.1190515
## inner_london
## 1.6126184
The Stepwise Selection Model has 8 predictors:
# exhaustive search using regsubsets
aoutput2 <- (regsubsets(reception_prevalence_of_obesity ~ . - areaname, data = model_data2, nvmax = 16, method = "exhaustive"))
# summary of the output, use this to select models
a2 <- summary(aoutput2)
Plotting BIC
# BIC Values for the best model with each number of variables
plot(a2$bic)
Chosen Variables
1. Deprivation Score - Positive relationship
2. London Factor - Positive relationship
3. % of population aged 0-15 4. Breastfeeding at Birth 5. Obesity in Adults 18+ - Positive relationship 6. Admissions for Respiratory Tract Infections using AIC, additional 2 variables:
Chosen Model
# Reception Model Chosen with regsubsets using BIC
Model4 <- lm(reception_prevalence_of_obesity ~ deprivation_score + obesity_qof_prevalence + inner_london + `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` + breastfeeding_initiation + admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year, data = model_data2)
summary(Model4)
##
## Call:
## lm(formula = reception_prevalence_of_obesity ~ deprivation_score +
## obesity_qof_prevalence + inner_london + `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` +
## breastfeeding_initiation + admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year,
## data = model_data2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9144 -0.7459 0.0329 0.7097 3.7761
##
## Coefficients:
## Estimate
## (Intercept) 0.0725200
## deprivation_score 0.1122737
## obesity_qof_prevalence 0.2272590
## inner_london 1.4587311
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` 0.1401095
## breastfeeding_initiation 0.0350890
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year -0.0010747
## Std. Error
## (Intercept) 1.7274942
## deprivation_score 0.0148745
## obesity_qof_prevalence 0.0649953
## inner_london 0.4318203
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` 0.0471724
## breastfeeding_initiation 0.0146496
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year 0.0004835
## t value
## (Intercept) 0.042
## deprivation_score 7.548
## obesity_qof_prevalence 3.497
## inner_london 3.378
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` 2.970
## breastfeeding_initiation 2.395
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year -2.223
## Pr(>|t|)
## (Intercept) 0.966574
## deprivation_score 5e-12
## obesity_qof_prevalence 0.000631
## inner_london 0.000944
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` 0.003500
## breastfeeding_initiation 0.017924
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year 0.027817
##
## (Intercept)
## deprivation_score ***
## obesity_qof_prevalence ***
## inner_london ***
## `percentage_of_the_total_resident_population_who_are_0-15_years_of_age` **
## breastfeeding_initiation *
## admissions_for_respiratory_tract_infections_in_infants_aged_under_1_year *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.086 on 141 degrees of freedom
## Multiple R-squared: 0.5765, Adjusted R-squared: 0.5585
## F-statistic: 32 on 6 and 141 DF, p-value: < 2.2e-16
# model plots
plot(Model4)
Points of interest
38: Reading
102: Enfield
112: Richmond upon Thames
Reception
1. Deprivation Score
2. Inner London Indicator
3. % of population aged 0-15
4. Breastfeeding at Birth
5. Obesity in Adults 18+ - Positive relationship
6. Admissions for Respiratory Tract Infections
Year 6
1. deprivation score
2. % of population aged 0-15
3. Inner London Indicator
4. Physically active adults
5. Admissions for Respiratory Tract Infections
6. Admissions for Respiratory Tract Infections
The Reception model has an adjusted r squared value of 0.559 for the 6 variable model where as the year 6 model has an adjusted r squared value of 0.762 for the same number of variables. This shows that the Year 6 model is fitting the data much better than the reception model.
Would like to repeat the model for 2016/17 Y6 and 2009/10 Reception and see if time makes a difference
‘ignoring’ time when selecting the explanatory variables (how much of a time delay is there in certain variables affecting others?)
Introduce more predictor variables or different time choices more suited to modelling reception children
More/different predictors, not just ones on fingertips (e.g. number of fast food resturants in local area, average shopping basket, london affect)
Compare specific children through time